(54) Method of encoding digital audio signals

(57) The method of encoding digital data in the
present invention enables change of the minimum limit of
audibility characteristics and/or the masking characteristics,
which are usually set on the basis of the aural-psychological
characteristics of persons with typical hearing,
thus changing the allocation of quantized bits to each
frequency band and allowing selection of a sound quality
which accords with the listener's hearing. The
present invention is suitable for ATRAC, a method of
compressed encoding for mini-discs.

wave signal, the number of bits allocated to the
frequency bands f1 which include the sine wave signal
becomes relatively smaller, the quantization error of the
sine wave signal becomes greater, and sound quality
deteriorates.

In regard to this point, the present Applicant has 
proposed, in Japanese Unexamined Patent Publication 
7-202823/1995, a structure which automatically limits 
the number of bits which may be allocated to frequency 
bands with low power S. However, a drawback of this
conventional art is that, since the maximum number of
bits which may be allocated to each frequency band is
determined on the basis of its power, when the power of
white noise is large, there are cases when no limitation
on bit allocation to that frequency band is made.

SUMMARY OF THE INVENTION 

One object of the present invention is to provide a 
method of encoding digital data capable of attaining a
sound quality which accords with the listener's hearing. 

Another object of the present invention is to provide 
a method of encoding digital data capable of preventing 
deterioration of sound quality even of signals with
narrow spectrum bands.

In order to realize the first object mentioned above,
the first method of encoding digital data of the present
invention encodes digital data such as musical tones
and sounds by converting it into frequency domains,
dividing the converted spectra into a plurality of
frequency bands, changing a minimum limit of audibility
characteristic so as to set a masking threshold, and
allocating quantized bits for each frequency band in
accordance with ratios of masking threshold to noise which are
found for each frequency band in accordance with
power or energy of each frequency band in consideration
of aural-psychological characteristics.

The above structure, by enabling change of the
minimum limit of audibility characteristic among
aural-psychological characteristics, frees aural-psychological
characteristics from definition by the characteristics of
persons with typical hearing, and makes possible selection
of whether or not to allocate bits to spectra with
small inaudible domains, or spectra with ultra-low or
ultra-high domains. Accordingly, it becomes possible to
respond to persons with superior hearing or to individual,
subjective preference, and sound quality which
accords with listeners' hearing can be attained.

Next, in order to realize the first object mentioned
above, the second method of encoding digital data of
the present invention encodes digital data such as musical
tones and sounds by converting it into frequency
domains, dividing the converted spectra into a plurality
of frequency bands, changing a masking characteristic
so as to set a masking threshold, and allocating quantized
bits for each frequency band in accordance with
ratios of the masking threshold to noise for each
frequency band which are found in accordance with power
or energy of each frequency band in consideration of
aural-psychological characteristics.

The above structure, by enabling change of the 
masking characteristic among the aural-psychological 
characteristics, frees aural-psychological characteris- 
tics from definition by the characteristics of persons with 
typical hearing, and makes possible selection of
whether to allocate bits to spectra which,
for example, suffer masking in a critical band. Accordingly,
it becomes possible to respond to persons with
superior hearing or to individual, subjective preference, 
and sound quality which accords with listeners' hearing 
can be attained. 

Next, in order to realize the first object mentioned 
above, the third method of encoding digital data of the 
present invention encodes digital data such as musical 
tones and sounds by converting it into frequency 
domains, dividing the converted spectra into a plurality 
of frequency bands, and switching among (i) bit alloca- 
tion in accordance with ratios of masking threshold to 
noise which are found for each frequency band in 
accordance with power or energy of each frequency 
band in consideration of aural-psychological character- 
istics, (ii) bit allocation in accordance with a representa- 
tive value of the power or the energy of each frequency 
band, and (iii) bit allocation giving weight to each of the 
foregoing bit allocation methods. 

With respect to data, such as white noise, having a
spectral composition which is comparatively flat, the
above structure makes possible bit allocation which is
flat along the frequency axis. Again, with respect to
data, such as sine wave signals, with narrow band
width, the above structure makes possible bit allocation
which emphasizes the signal with narrow band width.
Accordingly, selection of a sound quality which is suited
to the source of the musical tone is made possible.

Finally, the fourth method of encoding digital data of
the present invention, in order to realize the second
object mentioned above, switches among bit allocation
methods (i), (ii), and (iii) described in the third method of
encoding digital data in accordance with a relationship 
between the masking threshold and peaks and local 
peaks found based on differences in power or energy 
between adjacent spectra within each frequency band. 

The above structure makes it possible to automati- 
cally allocate bits according to the method most suited 
to the digital data, whether it is white noise or other data 
with wide band width, or sine wave signals or other data 
with narrow band width, thus preventing deterioration of 
sound quality, even with musical tones not suited to bit 
allocation using simultaneous masking such as the 
masking threshold/noise ratio. 

The other objects, features, and superior points of 
the present invention will be made clear by the description
below. Further, the advantages of this invention will
be evident from the following explanation in reference to 
the Figures. 

control microcomputer 36. In association with the system
control microcomputer 36 is provided an input operation
means, which enables sound-quality selection
operations, which will be discussed below, as well as
song title input, song selection operations, etc.

Next, the bit allocation method in the first embodi- 
ment of the present invention, which is performed 
according to the ATRAC method by the audio compres- 
sion circuit 6 of the mini-disc recording and reproduction 
device 1 structured as described above, will be 
explained, referring to Figures 1 and 3. 

In the ATRAC method, the audio data sampled at
44.1 kHz, as mentioned above, is divided into certain
frequency bands, specifically a Low frequency band
from 0 kHz to 5.5 kHz, a Middle frequency band from
5.5 kHz to 11 kHz, and a High frequency band from 11
kHz to 22 kHz, and the audio data bridging certain time
frames for each divided frequency band is converted, by
means of the MDCT processing, into MDCT coefficients,
which are frequency-domain data. The
MDCT coefficients converted in this manner are then
converted into spectrum powers Si for I frequency
bands (i = 1, 2, ..., I, with I equal to, for example,
25). Processing like that shown in Figure 3 is then carried
out to allocate quantized bits in accordance with
each spectrum power Si thus obtained.

The audio compression circuit 6 includes a table
ROM 6a, and in the table ROM 6a are stored masking
characteristics and/or minimum limit of audibility
characteristics according to the ATRAC method. These
minimum limit of audibility characteristics appear as a curve
shown by reference symbols a1, a2, a3, and a4 on
Figure 1. The masking characteristics, calculated in
accordance with the spectrum powers Si, a critical band
width of each frequency band, etc., appear, for a power
distribution like that shown in Figure 1, for example, as
a curve shown by reference symbols a11, a12, and
a13. The minimum limit of audibility characteristics
shown by the reference symbols a1 through a4 and
masking characteristics shown by reference symbols
a11 through a13 are prepared in accordance with the
aural-psychological characteristics of persons with typical
hearing characteristics, and are fixed characteristics.

However, in the first embodiment of the present
invention, the minimum limit of audibility and/or the
masking characteristics can be changed. In concrete
terms, for example in the case of the masking characteristics,
the greater the spectrum power and the higher
the frequency, the larger the range of masking of other
frequency bands. In the example in Figure 1, the maximum
limit Smax of the range influenced by spectrum
power S5, which is a peak power, is shown by
a13 × (1 ± Σk). Here, Σk is a coefficient for weighting. If
a plurality of variables k are stored in advance in the
table ROM 6a, and the variables k are switched by
means of a register 36a in the system control microcomputer
36, the masking characteristic curve a13 can be
changed within the range from a14 through a15. The
variable k can be set by the listener through the input
operating means 37.

For example, by changing the masking characteristic
curve from a13 to a14, the band masked is widened,
the level of masking is increased, and the number of bits
allocated to signals with low power is decreased, or
even eliminated. Accordingly, bit allocation to signals of
relatively greater power is increased, and the dynamic
range of the high-power signals is increased. If, on the
other hand, the masking characteristic curve is changed
from a13 to a15, bit allocation to low-power signals is
increased, and bit allocation to signals of relatively
greater power is decreased. Accordingly, the frequency
range can be enlarged. The same effect can also be
obtained by giving the masking characteristic curve a13
an offset instead of weighting.
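
The weighting and offset adjustments described above can be sketched as follows. This is a minimal illustration, not the circuit's actual implementation: the function name `adjust_curve`, the list representation of a curve, and the numeric values are all assumptions made for the example. A positive weight or offset raises the curve (more masking, as when moving from a13 toward a14), and a negative one lowers it.

```python
def adjust_curve(curve, weight=0.0, offset=0.0):
    """Return a changed copy of a characteristic curve.

    curve  : per-band levels of the reference curve (hypothetical values)
    weight : multiplicative change, applied as level * (1 + weight)
    offset : additive change applied after the weighting
    """
    return [level * (1.0 + weight) + offset for level in curve]


reference = [10.0, 20.0]                    # hypothetical reference levels
raised = adjust_curve(reference, weight=0.5)    # stronger masking
lowered = adjust_curve(reference, weight=-0.5)  # weaker masking
shifted = adjust_curve(reference, offset=3.0)   # offset instead of weighting
print(raised, lowered, shifted)
```

In a device, the weight would presumably be one of several stored values selected by the listener; here it is simply passed as an argument.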

In the same way, with regard to the minimum limit of
audibility characteristics, the minimum limit of audibility
characteristic curve a1 through a4, which is based on
the aural-psychological characteristics of persons with
typical hearing characteristics, can be weighted or given
an offset, thereby changing the a4 portion of the curve,
for example, as shown by reference symbol a5. In this
way, relatively more bits are allocated to the
high-frequency bands.

Next, processing for allocation of quantized bits will
be explained, referring to Figure 3. First, in Step p1, the
spectrum power Si of each frequency band is calculated
from the sum of squares of the MDCT coefficients for
that frequency band (which are obtained by means of
the MDCT processing). In Step p2, the audio compression
circuit 6 selects, through the register 36a of the
system control microcomputer 36, parameters for
change of masking characteristics, such as the variables
k, which are stored in the table ROM 6a. In Step p3,
in the same way as in Step p2, parameters for change
of the minimum limit of audibility characteristics are
selected.
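
The calculation in Step p1 can be illustrated with a short sketch. The function name `spectrum_powers` and the band boundaries are hypothetical; only the sum-of-squares rule comes from the description above.

```python
def spectrum_powers(mdct_coeffs, band_edges):
    """Return the power S_i of each band as the sum of squared MDCT coefficients.

    band_edges lists the start index of each band plus the final end index,
    so band i covers mdct_coeffs[band_edges[i]:band_edges[i + 1]].
    """
    powers = []
    for start, stop in zip(band_edges[:-1], band_edges[1:]):
        powers.append(sum(c * c for c in mdct_coeffs[start:stop]))
    return powers


coeffs = [1.0, -2.0, 0.5, 0.5, 3.0, 0.0]   # hypothetical MDCT coefficients
edges = [0, 2, 4, 6]                       # three hypothetical bands
print(spectrum_powers(coeffs, edges))      # [5.0, 0.5, 9.0]
```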

In Step p4, reference masking characteristics and
minimum limit of audibility characteristics previously
calculated and stored in the table ROM 6a are changed in
accordance with the parameters selected in Steps p2
and p3, and these two characteristics are synthesized in
order to determine a final masking threshold. In other
words, if the minimum limit of audibility characteristic
curve thus changed is as shown by the reference symbols
a1, a2, a3, a5, and the masking characteristic
curve thus changed is as shown by the reference symbols
a11, a12, a14, the curve of the final masking
threshold obtained by synthesis will be as shown by the
reference symbols a1, a12, a14, a3, a5.
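
One plausible reading of the synthesis in Step p4 is that the final threshold follows whichever of the two changed curves is higher in each band, which matches a final curve made up of segments of both. The sketch below assumes exactly that; the element-wise maximum, the function name, and the sample levels are assumptions, not details stated in the text.

```python
def final_masking_threshold(audibility, masking):
    """Combine two per-band curves by taking the higher level in each band.

    Assumption: "synthesis" means the upper envelope of the changed minimum
    limit of audibility curve and the changed masking curve.
    """
    if len(audibility) != len(masking):
        raise ValueError("both curves must cover the same bands")
    return [max(a, m) for a, m in zip(audibility, masking)]


audibility = [30, 10, 5, 8, 25]   # hypothetical changed audibility curve
masking = [0, 20, 40, 6, 0]       # hypothetical changed masking curve
print(final_masking_threshold(audibility, masking))  # [30, 20, 40, 8, 25]
```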

In Step p5, if the index of each frequency band is i,
the ratio of the frequency band's spectrum power Si
(calculated in Step p1) to its masking threshold Mi
(calculated in Step p4), SMRi = Si/Mi, is calculated for all
frequency bands. On a logarithmic graph, the ratio 
SMRi for each frequency band will correspond to that 

cases where it resembles a single sine wave, for example
with a solo piano piece, if the bit allocation is performed
only according to the ratio of masking threshold
to quantized noise MNRi(n), many bits will be allocated
to noise elements with low power, and the error in
quantizing of the piano becomes relatively great. However, if
the bit allocation percentage x can be changed as outlined
above, the bit allocation according to the power of
quantized noise SNi(n) is carried out in addition to that
according to the ratio of masking threshold to quantized
noise MNRi(n), thereby ensuring that the number of bits
allocated to the piano can be increased and the error in
quantizing of the piano is reduced.

Again, if the input signal is composed of sound with
many local peaks and noise, for example an orchestra
piece, the bit allocation can be performed in accordance
with the ratio of masking threshold to quantized noise
MNRi(n), in which the noise and the musical tones composing
small local peaks in bands close to large signals
can be masked, thus allocating no bits to them, and
more bits can be allocated to large signals which are not
masked. This enables high fidelity recording.

Further, with input signals lying between the foregoing
two examples, which are composed of a musical
tone with three or four local peaks and noise, for example
a solo clarinet piece, by giving weight both to the bit
allocation according to the ratio of masking threshold to
quantized noise MNRi(n) and to the bit allocation
according to the power of quantized noise SNi(n), fidelity
of the clarinet can be improved.

In this way, the bit allocation method most suited to 
any musical tone source can be selected. 

The third embodiment of the present invention will 
next be explained, in reference to Figures 5 and 6. 

Figure 5 is a flow chart for explaining the bit allocation
method in the third embodiment of the present
invention. The notable feature of this bit allocation
method is that the percentage x of (a) the bit allocation
according to the ratio of masking threshold to quantized
noise MNRi(n) to (b) the bit allocation according to the
power of quantized noise SNi(n) is automatically determined
on the basis of the relationship between (1)
peaks and local peaks in spectrum powers Si and (2)
masking thresholds.

First, the peak value among the spectrum powers
of all frequency bands from S1 to SI, such as that shown
by reference symbol S5 on Figure 6, is found. Then a
masking threshold, such as that shown on Figure 6,
which includes masking characteristics due to that peak
level, is found. Next, local peaks such as that shown by
reference symbol S8 on Figure 6 are found for each
frequency band. The number of such local peaks masked
by the masking threshold, and the number of such local
peaks not so masked, are respectively found, and the
ratio between masked local peaks and unmasked local
peaks determines the percentage x.

In other words, if the total number of local peaks is
NM, and the number of masked local peaks is M, then, when

M/(NM+1) = 0 (1)

holds, that is, when there are no masked local peaks, the
percentage x will be 0%, and the number of bits available
for the first allocation B1 will be set at 0. If, on the other
hand,

0 < M/(NM+1) ≤ 0.5 (2)

then x is from 50% to 90%, and if

0.5 < M/(NM+1) (3)

then x is 100%, and the number of bits available for the
first allocation B1 will be the entirety of total available
bits B0.
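
The selection rule in equations (1) through (3) can be sketched as a small function. Several details here are assumptions: the text does not say how a value within the 50% to 90% range is chosen, so the hypothetical parameter `mid_share` stands in for that choice; a ratio of exactly 0.5 is treated as the middle case, in line with the Figure 6 example; and B1 is taken to be the share x of B0, truncated to a whole number of bits.

```python
def first_allocation_share(masked, total_local_peaks, mid_share=0.5):
    """Return the share x (0.0 to 1.0) of bits for the first allocation.

    masked            : M, number of local peaks below the masking threshold
    total_local_peaks : NM, total number of local peaks
    mid_share         : hypothetical choice within the 50% to 90% range
    """
    if not 0.5 <= mid_share <= 0.9:
        raise ValueError("mid_share must lie between 0.5 and 0.9")
    ratio = masked / (total_local_peaks + 1)
    if ratio == 0:
        return 0.0        # equation (1): no masked local peaks
    if ratio <= 0.5:
        return mid_share  # equation (2)
    return 1.0            # equation (3)


def first_allocation_bits(total_bits, share):
    """Return B1 as the share x of the total available bits B0 (truncated)."""
    return int(total_bits * share)


x = first_allocation_share(masked=1, total_local_peaks=1)
print(x, first_allocation_bits(1000, x))  # 0.5 500
```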

Here the detection of local peaks and the selection
of the percentage x will be discussed. The local peaks
are found for all the frequency bands after the peak
spectrum power (S5 in Figure 6) is found. In the example
in Figure 6, the differences D34, ..., D89 between
adjacent spectrum powers S3 to S9, together with their
polarities, are found within
a certain number of frequency bands from peak value
S5 (in the example in Figure 6, two frequency bands on
the low-frequency side and four on the high-frequency
side), and the local peaks are detected on the
basis of change in polarity and the absolute value of
those differences. In this way, the local peaks are found
across all frequency bands. In concrete terms, in the
case of Figure 6, there is only one local peak (S8), and
that local peak is masked by the masking threshold;
thus M/(NM+1) = 1/(1+1) = 0.5. Accordingly, equation
(2) above will be applied, and a percentage x = 50% to
90% will be selected.
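
A minimal sketch of the detection described above follows. It scans every band rather than a window around the main peak, and the names `find_local_peaks`, `count_masked`, and `min_step`, as well as the sample powers and threshold, are assumptions for illustration. A band counts as a local peak when the difference to its lower neighbour is positive, the difference to its upper neighbour is negative, and both exceed `min_step` in absolute value.

```python
def find_local_peaks(powers, min_step=0.0):
    """Return indices of local peaks, excluding the overall peak.

    powers must contain at least one band.
    """
    main_peak = max(range(len(powers)), key=lambda i: powers[i])
    peaks = []
    for i in range(1, len(powers) - 1):
        rise = powers[i] - powers[i - 1]   # difference to the lower band
        fall = powers[i + 1] - powers[i]   # difference to the upper band
        if i != main_peak and rise > min_step and fall < -min_step:
            peaks.append(i)
    return peaks


def count_masked(powers, peaks, threshold):
    """Count the local peaks whose power lies below the masking threshold."""
    return sum(1 for i in peaks if powers[i] < threshold[i])


# Hypothetical powers S1..S9 with the main peak at S5 and a local peak at S8.
powers = [1, 2, 4, 6, 20, 9, 5, 7, 3]
threshold = [0, 0, 0, 0, 0, 12, 10, 8, 6]
peaks = find_local_peaks(powers)
print(peaks, count_masked(powers, peaks, threshold))  # [7] 1
```

With one local peak and one masked local peak, M/(NM+1) = 1/(1+1) = 0.5, as in the Figure 6 example.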

Next, the bit allocation method of the third embodi- 
ment will be explained in reference to Figure 5. 

In this bit allocation method, after the spectrum 
power Si of each frequency band has been calculated in 
Step p21 as in Step p11 and Step p1 above, the peak 
value is found in Step p22, and the masking threshold 
including the masking characteristics of that peak value 
is found in Step p23. In Step p24, the percentage x is 
calculated by means of equations (1) through (3) above, 
and the number of bits available for the first allocation 
B1 is calculated. Then, in Steps p25 through p27, as in
Steps p15 through p17 above, the first bit allocation,
according to the ratio of masking threshold to quantized
noise MNRi(n), is performed, and then in Steps p28 and
p29, as in Steps p18 and p19 above, the second bit
allocation, according to the power of quantized noise
SNi(n), is performed.
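
The two-stage allocation can be sketched as below. The details of the individual steps are not reproduced in this section, so the greedy loops are an assumption rather than the described procedure. Further assumptions: levels are in dB, each added bit lowers a band's quantization noise by about 6.02 dB, a band with no bits has noise equal to its signal power, every band costs the same per bit, and bits left unused by the first pass are carried over to the second. All names and values are hypothetical.

```python
DB_PER_BIT = 6.02  # approximate noise reduction per added bit


def two_pass_allocation(power_db, mask_db, total_bits, share):
    """Allocate bits per band in two greedy passes (at least one band needed).

    Pass 1 spends up to share * total_bits on the band whose quantization
    noise exceeds its masking threshold by the most; masked bands get nothing.
    Pass 2 spends the rest on the band with the highest quantization noise.
    """
    bits = [0] * len(power_db)

    def noise_db(i):
        return power_db[i] - DB_PER_BIT * bits[i]

    def noise_to_mask_db(i):
        return noise_db(i) - mask_db[i]

    remaining = total_bits
    for _ in range(int(total_bits * share)):
        audible = [i for i in range(len(bits)) if noise_to_mask_db(i) > 0]
        if not audible:
            break  # all noise is masked; carry the rest to pass 2
        bits[max(audible, key=noise_to_mask_db)] += 1
        remaining -= 1
    for _ in range(remaining):
        bits[max(range(len(bits)), key=noise_db)] += 1
    return bits


power = [60.0, 30.0, 20.0]  # hypothetical band powers in dB
mask = [20.0, 40.0, 10.0]   # hypothetical masking thresholds in dB
print(two_pass_allocation(power, mask, 8, 1.0))  # [6, 0, 2]
print(two_pass_allocation(power, mask, 8, 0.5))  # [6, 2, 0]
```

In this example the masked middle band receives no bits when only the first allocation is used, and receives some once half of the budget is allocated by noise power.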

In this way, a bit allocation with high sound quality 
appropriate to the musical tones like that shown in Fig- 
ure 4 can be performed automatically, and deterioration 
of sound quality, even with respect to musical tones not 
suited to the bit allocation according to the ratio of 
masking threshold to quantized noise MNRi(n), can be 
prevented. 

each frequency band;

wherein said steps (i) and (ii) are performed
in accordance with the percentage so
as to allocate the total number of quantized
bits, thereby allocating the number of the quantized
bits of each frequency band.

8. The method of encoding digital data according to
Claim 7, wherein:

the percentage is determined in accordance
with a relationship between the masking
threshold and peaks and local peaks found
based on differences in power or energy
between adjacent spectra within each
frequency band.

9. The method of encoding digital data according to
Claim 7, wherein:

the percentage corresponds, if NM is the total
number of the local peaks, with a ratio of the
number M of the local peaks which are masked
by the masking threshold to the number N of
the local peaks which are not masked by the
masking threshold.

10. The method of encoding digital data according to
Claim 9, wherein:

the percentage is set to 0% when
M/(NM+1) = 0 is satisfied; and
the percentage is set to 100% when
0.5 < M/(NM+1) is satisfied.

11. A method of encoding digital data in which, in
converting digital data such as musical tones and
sounds into frequency domains, dividing the converted
spectra into a plurality of frequency bands,
and allocating bits for each frequency band so as to
encode the digital data, the allocating of bits is performed
in accordance with a ratio of masking
threshold to noise of each frequency band in
consideration of aural-psychological characteristics,
the ratio being calculated for each frequency band
in accordance with power or energy of each
frequency band, wherein:

minimum limit of audibility characteristics that
have been previously found are changeable.

12. A method of encoding digital data in which, in
converting digital data such as musical tones and
sounds into frequency domains, dividing the converted
spectra into a plurality of frequency bands,
and allocating bits for each frequency band so as to
encode the digital data, the allocating of bits is performed
in accordance with a ratio of masking
threshold to noise of each frequency band in
consideration of aural-psychological characteristics,
the ratio being calculated for each frequency band
in accordance with power or energy of each
frequency band, wherein:

masking characteristics that have been previ- 
ously found in accordance with the power or 
the energy are changeable. 

13. A method of encoding digital data in which digital data
such as musical tones and sounds is converted into 
frequency domains, the converted spectra are 
divided into a plurality of frequency bands, and bits 
are allocated for each frequency band so as to per- 
form the encoding, said method comprising the 
steps of: 

(i) performing a first allocation of the quantized 
bits in accordance with ratios of masking 
threshold to noise which are found for each fre- 
quency band in accordance with power or 
energy of each frequency band; 

(ii) performing a second allocation of the quan- 
tized bits in accordance with a representative 
value of the power or the energy of each fre- 
quency band; and 

(iii) performing a third allocation of the quan- 
tized bits giving weight to the bit allocation 
methods of each of said steps (i) and (ii); 

wherein said steps (i), (ii), and (iii) are 
switchable. 

14. The method of encoding digital data according to 
Claim 13, wherein: 

said steps (i), (ii), and (iii) are switched in 
accordance with a relationship between the 
masking threshold and peaks and local peaks 
found based on differences in power or energy 
between adjacent spectra within each fre- 
quency band. 